{ "cells": [ { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "# Python variables - behind the scenes" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We will now examine how Python stores objects in memory, and the link between variables and memory locations. You might be wondering why you need to worry about this, but it is actually essential to understand this in order to make the best use of Python's capabilities and avoid mistakes/bugs." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Assignment and modification" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Consider the following two examples. First:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a\n", "print(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 4\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This should hopefully make sense so far." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now consider the following example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a\n", "a.append(5)\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In this case, modifying ``a`` modified ``b`` too! This is not as intuitive... But if we do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 9\n", "print(a, b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "This time, changing ``a`` did not change ``b`` - what is happening?" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The key is to understand that executing:\n", "\n", "    variable = something\n", "\n", "will change which object ``variable`` is pointing to in memory (**assignment**). In contrast, when calling a method with:\n", "\n", "    variable.method()\n", "\n", "some (but not all) methods will modify the object that ``variable`` points to **in-place** (more information below)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Let's go over the examples above but this time with a graphical representation, where the yellow circles show the **variables**, and the blue rectangles show the **objects in memory**. If we do:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a\n", "a = 4" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "then what happens is the following." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "First, for ``a = 2`` we create space in memory for the value ``2`` and we assign that location in memory to the variable ``a``:\n", "\n", "![ex1_1](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex1_1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For ``b = a``, we are now assigning the variable ``b`` to point to the same object as ``a``:\n", "\n", "![ex1_2](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex1_2.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "And finally for ``a = 4``, we re-assign ``a`` to point to a different place in memory (containing ``4``) but ``b`` still points to the same object (``2``):\n", "\n", "![ex1_3](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex1_3.png)" ] }, { "cell_type": "markdown", 
"metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Now if we follow the same logic for the second example:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2, 3, 4]\n", "b = a\n", "a.append(5)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "we again start off by creating space in memory for the list ``[2, 3, 4]``, then we point the variable ``a`` to that location.\n", "\n", "![ex2_1](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex2_1.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "For ``b = a``, we then point ``b`` to the same location as ``a``. It is important to understand that **the list exists only once in memory**:\n", "\n", "![ex2_2](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex2_2.png)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "We now **modify, in-place,** the object that ``a`` is pointing to by ``a.append(5)`` - the concept of modifying the object is very important - we are not creating a new list, it is still in the same place in memory, even if it has one extra element now:\n", "\n", "![ex2_3](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex2_3.png)\n", "\n", "This means that because ``b`` is pointing to the same place in memory, it will also see a list with (now) four elements!" 
] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Then, by setting ``a = 9``, we are re-assigning ``a`` to point to a region in memory with the value ``9``:\n", "\n", "![ex2_4](http://wwwstaff.ari.uni-heidelberg.de/fschneider/teaching/py4sci/graphics/ex2_4.png)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In order to talk about this behavior, we use the terms **copying** and **referencing**. When we do:\n", "\n", "    variable = something\n", "\n", "then the **value** is actually created when writing ``something``. The assignment merely creates a pointer (“reference” is just a fancy name for that) from a name to that value/object in memory, and you could have more such names pointing to the same object." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Another important aspect is that whatever is on the right-hand side is evaluated first, and will (conceptually) result in the creation of a new object unless the ``something`` is already a reference (in which case ``variable`` and ``something`` will just refer to the same value). In the following cases, ``something`` is a \"literal\" (i.e. 
the representation of a value in the source code) or an expression that computes a new value, and a new object is created in memory:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 2\n", "b = a + 1\n", "c = b * 2\n", "print(a, b, c)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the second assignment below, ``something`` is a reference, and hence no new object is created:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [2,3,4]\n", "b = a  # b points to the same object as a" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In case you're uncertain at some point, Python's built-in ``id`` function tells you the identity of its argument; names that refer to the same object give the same ``id``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "id(a), id(b), id(c)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Copying" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In some cases, the behavior described above is not desirable, and we want to make a true copy, not just a reference, *because we want to change* ``b`` *without changing* ``a``:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from copy import deepcopy\n", "a = [2,3,4]\n", "b = deepcopy(a)\n", "a.append(5)\n", "print(a, b)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "id(a), id(b)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The ``copy`` module also contains a function ``copy``, which makes a shallow copy. 
If you want to really understand what's going on, it will probably help to create a nested list (as in ``[list(range(2)), list(range(3))]``), copy that and manipulate the inner lists.\n", "\n", "Note that slicing (usually) creates a copy, too (careful with NumPy arrays, though), which is why in quite a bit of source code you see slices when a copy is desired:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = list(range(4))\n", "b = a[:]\n", "id(a), id(b)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Methods" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "As mentioned above, some *methods* modify an object **in-place**:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [1,2,3]\n", "a.append(5)  # modifies ``a``" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "and some will return a copy rather than modifying the object." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "s = 'hello'\n", "s2 = s.upper()  # returns a copy of the string in uppercase without modifying s\n", "id(s), id(s2)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It should be clear from the documentation (e.g. ``s.upper?``) how a particular method behaves." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Mutable vs immutable objects" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Some objects are **immutable**, which means that they cannot be modified - examples include ``float``, ``int`` and ``str``. For instance, consider:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = 1.\n", "print(id(a))\n", "a = 2. 
\n", "print(id(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "In the second assignment (``a = 2.``), a new object is created in memory for ``2.``, and ``a`` points to that object, not to ``1.`` (in other words, the float is not being changed, it is ``a`` that is pointing to a different object)." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In contrast, ``list``, ``dict`` and NumPy arrays are **mutable**, which means the object can be modified:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "a = [1,2,3]\n", "print(id(a))\n", "a.append(5)\n", "print(id(a))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "After the call to ``append``, ``a`` still points at the same list (the ``id`` is unchanged), but the list has now been modified." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Functions" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "A final but important point is that when passing variables to functions, the function receives a reference to the same object rather than a copy, so a mutable argument can be modified in place:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def do(x):\n", "    x.append(1)\n", "\n", "a = [1,2]\n", "do(a)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "The following, however, just re-assigns the name ``x`` inside the function ``do`` and thus has no effect outside of ``do`` (because ``x`` is a local variable that is only valid within ``do``):" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "def do(x):\n", "    x = 0  # re-assigns x to 0, but only in the function\n", "\n", "a = [1,2]\n", "do(a)\n", "print(a)" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "## Copying and Referencing NumPy arrays" ] }, { 
"cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "With NumPy arrays, one has to be particularly careful with copying and referencing. With a few exceptions (and superficially contrary to the behavior of almost all other Python objects), most slicing/masking operations in NumPy work on **references** to the data, not copies. This greatly improves speed and reduces memory usage." ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import numpy as np" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x\n", "y[3] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "This is similar to lists, but now consider the following:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x[::2]\n", "y[3] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x, y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "Even though we took a slice with a given start, end, and step-size, the resulting array is still just a reference, or **view**, of the original array! (note that for lists, ``x[::2]`` returns a copy!). 
This can be very handy when combined with masking:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "x[x < 5] = 0\n", "x" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "An important exception to the referencing is indexing with a list or array of indices (\"fancy indexing\"), which returns a copy:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x[[1,3,2,2]]  # returns a new array, not a view\n", "y[0] = 9\n", "x, y" ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "As before, you can explore this further to understand in what cases references or copies are made. However, be aware that the ``id`` of a view *will* be different from the original array, even though the view is actually pointing to a subset of the original array; ``np.shares_memory(x, y)`` is a more reliable check." ] }, { "cell_type": "markdown", "metadata": { "slideshow": { "slide_type": "slide" } }, "source": [ "In the case of NumPy arrays, one can force a copy with the ``copy`` method:" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x = np.arange(10)\n", "y = x.copy()\n", "y[0] = -1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "x" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "y" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Before you start cursing the NumPy authors because it might seem they were out to confuse you: they did this because very common operations become very fast in this way, and in practice that's much less of a trap than you may suspect." 
] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Exercise" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The following questions are just to test your understanding of variable assignment - you don't need to write any code - just try and think of what the output will be, then you can try it out to check if you got it right:\n", "\n", "What will ``a`` be after the following?\n", "\n", "    a = [1, 3., [1, 2, 3], 'hello']\n", "    b = a[0]\n", "    b = 4.\n", "\n", "What will ``c`` be after the following?\n", "\n", "    c = [1, 3., [1, 2, 3], 'hello']\n", "    d = c[2]\n", "    d.append(8)\n", "\n", "What will ``e`` be after the following?\n", "\n", "    e = [1, 3., [1, 2, 3], 'hello']\n", "    f = e[2]\n", "    f = [1, 2]\n", "\n", "What will ``g`` be after the following?\n", "\n", "    g = [1, 2, 3, 4]\n", "    h = g[::2]\n", "    h[0] = 9\n", "\n", "What will ``i`` be after the following?\n", "\n", "    import numpy as np\n", "    i = np.array([1, 2, 3, 4])\n", "    j = i[::2]\n", "    j[0] = 9\n", "\n", "What will ``matrix`` be after the following? (*Hint:* What does ``[0]*2`` do in the first place?)\n", "\n", "    matrix = [[0]*2]*3\n", "    matrix[0][0] = 1" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "# You can try the above code here." ] } ], "metadata": { "anaconda-cloud": {}, "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }